Student’s t distribution (t)#

The Student’s t distribution is a continuous distribution on (\mathbb{R}) that looks normal near the center but has heavier tails.

It shows up in two complementary ways:

  • as the sampling distribution of the t-statistic when the variance is unknown

  • as a practical robust noise model (a heavy-tailed alternative to a Gaussian)

Learning goals#

By the end, you should be able to:

  • write down the PDF/CDF and understand the role of degrees of freedom

  • explain where the t distribution comes from (normal + chi-square)

  • compute key properties (when moments exist, entropy, characteristic function)

  • sample from it using NumPy only

  • use scipy.stats.t for evaluation, simulation, and fitting

import platform

import numpy as np

import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

import scipy
from scipy import optimize, special, stats
from scipy.stats import norm
from scipy.stats import t as t_dist

# Plotly rendering (CKC convention)
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

# Reproducibility
rng = np.random.default_rng(42)
np.set_printoptions(precision=6, suppress=True)

VERSIONS = {
    "python": platform.python_version(),
    "numpy": np.__version__,
    "scipy": scipy.__version__,
    "plotly": plotly.__version__,
}

VERSIONS
{'python': '3.12.9', 'numpy': '1.26.2', 'scipy': '1.15.0', 'plotly': '6.5.2'}

1) Title & Classification#

  • Name: t (Student’s t distribution; SciPy: scipy.stats.t)

  • Type: Continuous

  • Support (standard form): (x \in (-\infty, \infty))

  • Parameter space (standard form): degrees of freedom (\nu > 0)

  • SciPy location/scale: loc \in \mathbb{R}, scale > 0 with

[ X = \mathrm{loc} + \mathrm{scale}\,T, \qquad T \sim t_{\nu}. ]

Unless stated otherwise, we work with the standard t distribution (loc=0, scale=1).

2) Intuition & Motivation#

What it models#

A t distribution behaves like a normal distribution near its center, but its tails decay only polynomially. This makes it a good model for occasional large deviations.

A canonical generative story is:

  1. Draw a standard normal (Z \sim \mathcal{N}(0,1))

  2. Draw an independent chi-square (V \sim \chi^2_{\nu})

  3. Form the ratio

[ T = \frac{Z}{\sqrt{V/\nu}} \sim t_{\nu}. ]

The random denominator inflates or deflates the normal draw, producing heavier tails.

Typical real-world use cases#

  • Small-sample inference: the t-statistic ((\bar X-\mu_0)/(S/\sqrt{n})) follows a t distribution under normality.

  • Robust modeling: t likelihoods reduce the influence of outliers (common in finance, sensor data, web metrics).

  • Bayesian modeling: t priors/likelihoods appear as normal–gamma mixtures; posterior predictive distributions are often t.

Relations to other distributions#

  • As (\nu \to \infty), (t_{\nu}) converges to (\mathcal{N}(0,1)).

  • (t_1) is exactly the standard Cauchy distribution.

  • If (T \sim t_{\nu}), then (T^2 \sim F_{1,\nu}) (F distribution).

  • The t distribution is a scale mixture of normals: if (\lambda \sim \mathrm{Gamma}(\nu/2,\ \nu/2)) (shape–rate), then

[ T\mid\lambda \sim \mathcal{N}(0, 1/\lambda) \quad\Rightarrow\quad T \sim t_{\nu}. ]
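These relations are easy to spot-check numerically. A quick sketch using scipy.stats (the grid, sample size, and seed below are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu = 5.0

# t with df=1 is exactly the standard Cauchy.
x = np.linspace(-5, 5, 201)
pdf_diff = np.max(np.abs(stats.t.pdf(x, 1) - stats.cauchy.pdf(x)))

# If T ~ t_nu then T^2 ~ F(1, nu): P(T^2 <= y) = 2*F_T(sqrt(y)) - 1.
y = np.linspace(0.01, 10.0, 100)
cdf_diff = np.max(np.abs((2.0 * stats.t.cdf(np.sqrt(y), nu) - 1.0) - stats.f.cdf(y, 1, nu)))

# Scale mixture of normals: lam ~ Gamma(nu/2, rate nu/2), then T | lam ~ N(0, 1/lam).
lam = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=100_000)  # rate nu/2 => scale 2/nu
mix = rng.standard_normal(lam.size) / np.sqrt(lam)
q = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
quantile_diff = np.max(np.abs(np.quantile(mix, q) - stats.t.ppf(q, nu)))
```

The first two checks agree to floating-point precision; the mixture check is Monte Carlo, so its quantiles only match to sampling error.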

3) Formal Definition#

Let (T \sim t_{\nu}) be a standard Student’s t random variable with degrees of freedom (\nu>0).

PDF#

[ f(t;\nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad t \in \mathbb{R}. ]

For the location–scale family (X = \mathrm{loc} + \mathrm{scale}\,T) with (\mathrm{scale}>0), the density is

[ f_X(x;\nu,\mathrm{loc},\mathrm{scale}) = \frac{1}{\mathrm{scale}}\,f\!\left(\frac{x-\mathrm{loc}}{\mathrm{scale}};\nu\right). ]

CDF#

The CDF can be written using the regularized incomplete beta function (I_x(a,b)). Let

[ w(t) = \frac{\nu}{\nu + t^2} \in (0,1]. ]

Then for (t\ge 0):

[ F(t;\nu) = 1 - \tfrac{1}{2} I_{w(t)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right), ]

and by symmetry for (t<0):

[ F(t;\nu) = \tfrac{1}{2} I_{w(t)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right). ]

def _validate_df_loc_scale(df: float, loc: float, scale: float) -> tuple[float, float, float]:
    df = float(df)
    loc = float(loc)
    scale = float(scale)
    if not (df > 0.0):
        raise ValueError("df (ν) must be > 0")
    if not (scale > 0.0):
        raise ValueError("scale must be > 0")
    return df, loc, scale


def t_logpdf(x: np.ndarray, df: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
    '''Log-PDF of the Student t distribution with df, loc, scale.

    Uses stable log-gamma expressions.
    '''

    df, loc, scale = _validate_df_loc_scale(df, loc, scale)
    x = np.asarray(x, dtype=float)
    z = (x - loc) / scale

    log_norm = (
        special.gammaln((df + 1.0) / 2.0)
        - special.gammaln(df / 2.0)
        - 0.5 * np.log(df * np.pi)
        - np.log(scale)
    )

    return log_norm - ((df + 1.0) / 2.0) * np.log1p((z * z) / df)


def t_pdf(x: np.ndarray, df: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
    return np.exp(t_logpdf(x, df=df, loc=loc, scale=scale))


def t_cdf(x: np.ndarray, df: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
    '''CDF via the regularized incomplete beta representation.'''

    df, loc, scale = _validate_df_loc_scale(df, loc, scale)
    x = np.asarray(x, dtype=float)
    z = (x - loc) / scale

    w = df / (df + z * z)
    ib = special.betainc(df / 2.0, 0.5, w)

    return np.where(z >= 0.0, 1.0 - 0.5 * ib, 0.5 * ib)
# Sanity checks against SciPy

df = 7.5
loc = -0.3
scale = 1.7

x = np.linspace(-6, 6, 501)

assert np.allclose(t_pdf(x, df, loc=loc, scale=scale), t_dist.pdf(x, df, loc=loc, scale=scale), rtol=1e-10, atol=0)
assert np.allclose(t_cdf(x, df, loc=loc, scale=scale), t_dist.cdf(x, df, loc=loc, scale=scale), rtol=1e-10, atol=0)

"ok"
'ok'

4) Moments & Properties#

Mean / variance / skewness / kurtosis#

Let (T\sim t_{\nu}) (standard, centered).

  • Mean: (\mathbb{E}[T]=0) if (\nu>1); undefined for (\nu\le 1).

  • Variance: (\mathrm{Var}(T)=\dfrac{\nu}{\nu-2}) if (\nu>2); infinite for (1<\nu\le 2).

  • Skewness: 0 if (\nu>3); undefined otherwise.

  • Excess kurtosis: (\dfrac{6}{\nu-4}) if (\nu>4); infinite for (2<\nu\le 4).

For the location–scale family (X = \mathrm{loc} + \mathrm{scale}\,T), when the mean/variance exist:

[ \mathbb{E}[X]=\mathrm{loc}, \qquad \mathrm{Var}(X)=\mathrm{scale}^2\,\frac{\nu}{\nu-2}. ]

MGF and characteristic function#

  • The moment generating function (M_T(t)=\mathbb{E}[e^{tT}]) does not exist (is infinite) for any nonzero (t). Intuition: t tails behave like (|x|^{-(\nu+1)}), and (e^{t x}) dominates any polynomial as (x\to\infty).

  • The characteristic function exists for all (t) and can be written using the modified Bessel function (K_{\alpha}):

[ \varphi_T(t) = \mathbb{E}[e^{i t T}] = \frac{(\sqrt{\nu}|t|)^{\nu/2}\,K_{\nu/2}(\sqrt{\nu}|t|)}{2^{\nu/2-1}\,\Gamma(\nu/2)}. ]
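The divergence of the MGF can be seen numerically: the truncated integral of (e^{tx} f(x;\nu)) keeps growing as the truncation window widens, instead of converging. A sketch with the arbitrary choices (\nu=3), (t=0.5):

```python
import numpy as np
from scipy import stats

def truncated_mgf(t, df, A, n=200_001):
    """Trapezoid-rule integral of exp(t*x) * f(x; df) over [-A, A]."""
    x = np.linspace(-A, A, n)
    vals = np.exp(t * x + stats.t.logpdf(x, df))  # logpdf avoids premature underflow
    trapezoid = getattr(np, "trapezoid", None) or np.trapz  # NumPy 2.x renamed trapz
    return trapezoid(vals, x)

# Growing truncation windows: the integral increases without bound.
vals = [truncated_mgf(0.5, 3, A) for A in (20.0, 40.0, 80.0)]
```

Because the polynomial tail (x^{-(\nu+1)}) cannot counter the exponential factor, each doubling of the window multiplies the integral by a huge factor.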

Entropy#

The differential entropy of the standard t distribution is

[ H(T) = \log\bigl(\sqrt{\nu}\,B(\nu/2, 1/2)\bigr) + \frac{\nu+1}{2}\left[\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right], ]

where (B) is the beta function and (\psi) is the digamma function. For (X=\mathrm{loc}+\mathrm{scale}\,T),

[ H(X)=H(T)+\log(\mathrm{scale}). ]

def t_moments(df: float, loc: float = 0.0, scale: float = 1.0) -> dict:
    df, loc, scale = _validate_df_loc_scale(df, loc, scale)

    mean = np.nan
    var = np.nan
    skew = np.nan
    excess_kurt = np.nan

    if df > 1:
        mean = loc
    if df > 2:
        var = (scale * scale) * df / (df - 2.0)
    elif df > 1:
        var = np.inf

    if df > 3:
        skew = 0.0

    if df > 4:
        excess_kurt = 6.0 / (df - 4.0)
    elif df > 2:
        excess_kurt = np.inf

    return {
        "df": df,
        "loc": loc,
        "scale": scale,
        "mean": mean,
        "var": var,
        "skew": skew,
        "excess_kurt": excess_kurt,
        "kurt": excess_kurt + 3.0 if np.isfinite(excess_kurt) else excess_kurt,
    }


def t_entropy(df: float, scale: float = 1.0) -> float:
    df, _, scale = _validate_df_loc_scale(df, 0.0, scale)
    a = df / 2.0
    term1 = 0.5 * np.log(df) + special.betaln(a, 0.5)
    term2 = (df + 1.0) / 2.0 * (special.digamma((df + 1.0) / 2.0) - special.digamma(a))
    return float(term1 + term2 + np.log(scale))


def t_charfun(t: np.ndarray, df: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
    '''Characteristic function of Student-t (df, loc, scale).'''

    df, loc, scale = _validate_df_loc_scale(df, loc, scale)

    t = np.asarray(t, dtype=float)
    a = df / 2.0
    x = np.sqrt(df) * np.abs(scale * t)

    out = np.empty_like(x, dtype=float)
    mask0 = x == 0.0
    out[mask0] = 1.0

    xm = x[~mask0]
    log_const = -(a - 1.0) * np.log(2.0) - special.gammaln(a)
    out[~mask0] = np.exp(log_const + a * np.log(xm) + np.log(special.kv(a, xm)))

    return np.exp(1j * loc * t) * out


df_demo = 5
{
    "moments": t_moments(df_demo),
    "entropy": t_entropy(df_demo),
}
{'moments': {'df': 5.0,
  'loc': 0.0,
  'scale': 1.0,
  'mean': 0.0,
  'var': 1.6666666666666667,
  'skew': 0.0,
  'excess_kurt': 6.0,
  'kurt': 9.0},
 'entropy': 1.627502672414396}
# Characteristic function: closed form vs Monte Carlo estimate (memory-safe)

df = 5
n = 80_000
samples = t_dist.rvs(df, size=n, random_state=rng)

t_grid = np.linspace(0, 15, 300)
phi_formula = np.real(t_charfun(t_grid, df=df))
phi_mc = np.array([np.real(np.mean(np.exp(1j * t0 * samples))) for t0 in t_grid])

fig = go.Figure()
fig.add_trace(go.Scatter(x=t_grid, y=phi_formula, mode="lines", name="closed form"))
fig.add_trace(go.Scatter(x=t_grid, y=phi_mc, mode="lines", name="Monte Carlo", line=dict(dash="dot")))
fig.update_layout(
    title="Characteristic function of t distribution (df=5)",
    xaxis_title="t",
    yaxis_title="Re φ(t)",
    width=900,
    height=420,
)
fig

5) Parameter Interpretation#

The t distribution has three knobs in SciPy: (df, loc, scale).

  • df ((\nu), degrees of freedom) controls tail heaviness.

    • small (\nu) ⇒ very heavy tails and unstable sample means

    • (\nu\to\infty) ⇒ approaches the standard normal

  • loc shifts the distribution (it is the center of symmetry, and the mean when it exists).

  • scale stretches the distribution.

A practical way to compare tails is via the decay rate:

[ f(t;\nu) \propto |t|^{-(\nu+1)}\quad\text{for large }|t|. ]

So each additional degree of freedom makes the tail a little lighter.
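One way to verify this decay rate is to measure the slope of the log-PDF against (\log|t|) far in the tail; it should approach (-(\nu+1)). A quick sketch (the evaluation points are arbitrary):

```python
import numpy as np
from scipy import stats

def tail_slope(df, t1=1e3, t2=1e4):
    # Slope of log f(t) vs log t in the far tail; should be close to -(df + 1).
    lp1, lp2 = stats.t.logpdf([t1, t2], df)
    return (lp2 - lp1) / (np.log(t2) - np.log(t1))

slopes = {df: tail_slope(df) for df in (1, 2, 5, 10)}
```

For df=5 the measured slope is essentially -6, matching the (|t|^{-(\nu+1)}) decay.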

x = np.linspace(-6, 6, 2000)

dfs = [1, 2, 5, 10, 30]

fig = go.Figure()
for df in dfs:
    fig.add_trace(go.Scatter(x=x, y=t_pdf(x, df=df), mode="lines", name=f"t(df={df})"))

fig.add_trace(go.Scatter(x=x, y=norm.pdf(x), mode="lines", name="N(0,1)", line=dict(color="black", dash="dash")))

fig.update_layout(
    title="PDF shape changes: degrees of freedom ν",
    xaxis_title="x",
    yaxis_title="density",
    width=900,
    height=420,
)
fig
# Effect of loc and scale
x = np.linspace(-10, 10, 2500)
params = [
    (5, 0.0, 1.0),
    (5, 2.0, 1.0),
    (5, 0.0, 2.0),
    (30, 0.0, 1.0),
]

fig = go.Figure()
for df, loc, scale in params:
    fig.add_trace(go.Scatter(x=x, y=t_pdf(x, df=df, loc=loc, scale=scale), mode="lines", name=f"df={df}, loc={loc}, scale={scale}"))

fig.update_layout(
    title="Location/scale effects",
    xaxis_title="x",
    yaxis_title="density",
    width=900,
    height=420,
)
fig

6) Derivations#

6.1 Expectation#

For the standard t distribution, the PDF is symmetric: (f(t;\nu)=f(-t;\nu)). So the integral of (t f(t;\nu)) over symmetric limits is 0.

However, a mean exists only if the integral of (|t| f(t;\nu)) is finite. Because the tails behave like (|t|^{-(\nu+1)}),

[ \int_{-\infty}^{\infty} |t|\, f(t;\nu)\,dt\ \text{converges iff}\ \nu>1. ]

Thus:

  • if (\nu>1), (\mathbb{E}[T]=0)

  • if (\nu\le 1), the mean is undefined

6.2 Variance#

Use the ratio representation (T = Z/\sqrt{V/\nu}) with (Z\sim\mathcal{N}(0,1)), (V\sim\chi^2_{\nu}), independent. Then

[ T^2 = \frac{Z^2}{V/\nu} = \nu\,\frac{Z^2}{V}. ]

If (\nu>2), the second moment is finite and, since (\mathbb{E}[T]=0), it equals the variance:

[ \mathrm{Var}(T) = \mathbb{E}[T^2] = \nu\,\mathbb{E}[Z^2]\,\mathbb{E}[1/V] = \nu\cdot 1 \cdot \frac{1}{\nu-2} = \frac{\nu}{\nu-2}. ]

(The identity (\mathbb{E}[1/V]=1/(\nu-2)) holds for (V\sim\chi^2_{\nu}) and (\nu>2).)
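The inverse-chi-square identity is easy to verify by simulation (a sketch; sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 5.0

# Chi-square(nu) drawn via its Gamma(shape=nu/2, scale=2) representation.
v = rng.gamma(shape=nu / 2.0, scale=2.0, size=1_000_000)
inv_mean = float(np.mean(1.0 / v))  # theory: 1/(nu - 2) = 1/3
```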

6.3 Likelihood#

Given observations (x_1,\dots,x_n) modeled as i.i.d.

[ X_i \sim t_{\nu}(\mathrm{loc},\mathrm{scale}), ]

the log-likelihood is

[ \ell(\nu,\mathrm{loc},\mathrm{scale}) = \sum_{i=1}^n \log f_X(x_i;\nu,\mathrm{loc},\mathrm{scale}), ]

with

[ \log f_X(x_i) = \log\Gamma\left(\frac{\nu+1}{2}\right) - \log\Gamma\left(\frac{\nu}{2}\right) - \tfrac12\log(\nu\pi) - \log(\mathrm{scale}) - \frac{\nu+1}{2}\log\left(1 + \frac{1}{\nu}\left(\frac{x_i-\mathrm{loc}}{\mathrm{scale}}\right)^2\right). ]

This is what MLE routines optimize (including scipy.stats.t.fit).

def t_loglik(x: np.ndarray, df: float, loc: float, scale: float) -> float:
    return float(np.sum(t_logpdf(x, df=df, loc=loc, scale=scale)))


def t_nll(theta: np.ndarray, x: np.ndarray) -> float:
    df, loc, log_scale = float(theta[0]), float(theta[1]), float(theta[2])
    scale = float(np.exp(log_scale))
    # Penalize invalid df directly (optimizer-friendly).
    if df <= 0:
        return np.inf
    return -t_loglik(x, df=df, loc=loc, scale=scale)


# Quick demo: optimize (df, loc, scale) on simulated data
true = {"df": 6.0, "loc": 1.5, "scale": 0.8}
x = t_dist.rvs(true["df"], loc=true["loc"], scale=true["scale"], size=3_000, random_state=rng)

theta0 = np.array([10.0, np.median(x), np.log(np.std(x) + 1e-9)])
res = optimize.minimize(t_nll, theta0, args=(x,), method="Nelder-Mead")

df_hat, loc_hat, scale_hat = float(res.x[0]), float(res.x[1]), float(np.exp(res.x[2]))
{ "true": true, "mle_hat": {"df": df_hat, "loc": loc_hat, "scale": scale_hat} }
{'true': {'df': 6.0, 'loc': 1.5, 'scale': 0.8},
 'mle_hat': {'df': 5.6380083522442455,
  'loc': 1.4939574148670296,
  'scale': 0.7853995473987132}}

7) Sampling & Simulation (NumPy-only)#

Algorithm#

Use the defining ratio:

[ T = \frac{Z}{\sqrt{V/\nu}}, \qquad Z\sim\mathcal{N}(0,1),\ \ V\sim\chi^2_{\nu},\ \ Z\perp V. ]

To sample (V \sim \chi^2_{\nu}) for general (non-integer) (\nu), use the gamma representation:

[ V \sim \chi^2_{\nu} \iff V \sim \mathrm{Gamma}(k=\nu/2,\ \theta=2) ]

(shape (k), scale (\theta)).

Then return (X=\mathrm{loc}+\mathrm{scale}\,T).

def t_rvs_numpy(
    df: float,
    loc: float = 0.0,
    scale: float = 1.0,
    size: int | tuple[int, ...] = 1,
    rng: np.random.Generator | None = None,
) -> np.ndarray:
    '''NumPy-only sampler for Student-t via Normal / Chi-square ratio.'''

    df, loc, scale = _validate_df_loc_scale(df, loc, scale)
    rng = np.random.default_rng() if rng is None else rng

    z = rng.standard_normal(size=size)
    v = rng.gamma(shape=df / 2.0, scale=2.0, size=size)  # Chi-square(df)

    t = z / np.sqrt(v / df)
    return loc + scale * t


# Monte Carlo check

df = 8
n = 400_000
samples = t_rvs_numpy(df, size=n, rng=rng)

mc_mean = float(np.mean(samples))
mc_var = float(np.var(samples))

theory = t_moments(df)
(mc_mean, theory["mean"], mc_var, theory["var"]) 
(-0.0026917757260773163, 0.0, 1.3319992347178284, 1.3333333333333333)

8) Visualization#

We’ll visualize:

  • PDFs for multiple degrees of freedom (tail heaviness)

  • CDFs (how quickly probability accumulates)

  • Monte Carlo samples vs the theoretical PDF

# PDF + CDF side-by-side
x = np.linspace(-6, 6, 2000)
dfs = [1, 2, 5, 10, 30]

fig = make_subplots(rows=1, cols=2, subplot_titles=("PDF", "CDF"))

for df in dfs:
    fig.add_trace(go.Scatter(x=x, y=t_pdf(x, df=df), mode="lines", name=f"df={df}"), row=1, col=1)
    fig.add_trace(go.Scatter(x=x, y=t_cdf(x, df=df), mode="lines", name=f"df={df}", showlegend=False), row=1, col=2)

fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_yaxes(title_text="density", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="F(x)", row=1, col=2)

fig.update_layout(title="Student-t: PDF and CDF for different ν", width=1000, height=420)
fig
# Monte Carlo histogram vs theoretical PDF

df = 4
n = 200_000
samples = t_rvs_numpy(df, size=n, rng=rng)

x = np.linspace(-8, 8, 2500)

fig = go.Figure()
fig.add_trace(go.Histogram(x=samples, nbinsx=120, histnorm="probability density", name="samples", opacity=0.55))
fig.add_trace(go.Scatter(x=x, y=t_pdf(x, df=df), mode="lines", name="theoretical PDF", line=dict(color="black")))

fig.update_layout(
    title=f"Monte Carlo samples vs PDF (df={df})",
    xaxis_title="x",
    yaxis_title="density",
    width=900,
    height=420,
)
fig

9) SciPy Integration (scipy.stats.t)#

SciPy provides a full implementation of the t distribution:

  • t_dist.pdf(x, df, loc=..., scale=...)

  • t_dist.cdf(x, df, loc=..., scale=...)

  • t_dist.rvs(df, loc=..., scale=..., size=..., random_state=...)

  • t_dist.fit(data) for MLE estimation of (df, loc, scale)

We’ll use it both as a reference implementation and for fitting.

df, loc, scale = 6.0, 1.2, 0.7
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

pdf_vals = t_dist.pdf(x, df, loc=loc, scale=scale)
cdf_vals = t_dist.cdf(x, df, loc=loc, scale=scale)

samples = t_dist.rvs(df, loc=loc, scale=scale, size=20_000, random_state=rng)
fit_df, fit_loc, fit_scale = t_dist.fit(samples)

{
    "pdf": pdf_vals,
    "cdf": cdf_vals,
    "fit": {"df": float(fit_df), "loc": float(fit_loc), "scale": float(fit_scale)},
}
{'pdf': array([0.002866, 0.018138, 0.135473, 0.521502, 0.274424]),
 'cdf': array([0.001902, 0.009998, 0.068652, 0.392352, 0.851679]),
 'fit': {'df': 5.833234442641876,
  'loc': 1.198612165986697,
  'scale': 0.7063928901680383}}

10) Statistical Use Cases#

10.1 Hypothesis testing (why the t distribution exists)#

If (X_1,\dots,X_n) are i.i.d. (\mathcal{N}(\mu,\sigma^2)), then the t statistic

[ T = \frac{\bar X - \mu}{S/\sqrt{n}} ]

follows (t_{n-1}), where (S) is the sample standard deviation. That is the mathematical justification for t tests and t confidence intervals.
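The claim can be sanity-checked by simulation: t-based 95% confidence intervals should cover the true mean about 95% of the time. A sketch (the sample size, repetition count, and parameters below are arbitrary):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)
n, n_rep, mu, sigma = 10, 20_000, 3.0, 2.0

x = rng.normal(loc=mu, scale=sigma, size=(n_rep, n))
x_bar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)

# 95% CI: x_bar +/- t_{0.975, n-1} * s / sqrt(n)
half = t_dist.ppf(0.975, n - 1) * s / np.sqrt(n)
coverage = float(np.mean((x_bar - half <= mu) & (mu <= x_bar + half)))  # close to 0.95
```

Using a normal critical value (1.96) instead of the t critical value would undercover noticeably at n=10.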

10.2 Bayesian modeling#

Two common appearances:

  1. Posterior predictive: with a Normal–Inverse-Gamma prior for ((\mu,\sigma^2)), the posterior predictive distribution for a new observation is Student-t.

  2. Robust likelihood: using a t likelihood instead of Gaussian makes Bayesian inference less sensitive to outliers.

10.3 Generative modeling#

A Student-t noise model is a simple way to generate data with occasional big shocks while remaining symmetric and unimodal.
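A quick comparison of shock frequency illustrates the point: a (t_3) noise model produces "4-sigma-style" events orders of magnitude more often than a Gaussian. A sketch with arbitrary sample size and seed:

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)
n = 200_000

t_noise = t_dist.rvs(3, size=n, random_state=rng)  # heavy-tailed shocks
g_noise = rng.standard_normal(n)                   # Gaussian baseline

frac_t = float(np.mean(np.abs(t_noise) > 4.0))  # theory: 2*sf(4, df=3) ~ 0.028
frac_g = float(np.mean(np.abs(g_noise) > 4.0))  # theory: ~ 6e-5 for N(0, 1)
```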

# Simulate the t-statistic under the null and compare to t_{n-1}

n = 12
n_rep = 200_000
mu0 = 0.0
sigma = 2.0

x = rng.normal(loc=mu0, scale=sigma, size=(n_rep, n))
x_bar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)

t_stat = (x_bar - mu0) / (s / np.sqrt(n))

df = n - 1

grid = np.linspace(-6, 6, 2000)

fig = go.Figure()
fig.add_trace(go.Histogram(x=t_stat, nbinsx=140, histnorm="probability density", name="simulated t-stat", opacity=0.55))
fig.add_trace(go.Scatter(x=grid, y=t_dist.pdf(grid, df), mode="lines", name=f"t(df={df})", line=dict(color="black")))

fig.update_layout(
    title=f"t-statistic distribution under Normality (n={n} ⇒ df={df})",
    xaxis_title="t-stat",
    yaxis_title="density",
    width=900,
    height=420,
)
fig
# Bayesian posterior predictive (Normal data, unknown mean & variance)
# Prior: Normal-Inverse-Gamma (μ, σ^2)

# Synthetic data (with a couple outliers to make things interesting)
rng2 = np.random.default_rng(123)
y = rng2.normal(loc=1.0, scale=1.2, size=25)
y[[3, 17]] += np.array([5.0, -4.0])

# Prior hyperparameters (weakly informative)
mu0 = 0.0
kappa0 = 0.01
alpha0 = 2.0
beta0 = 2.0

n = y.size
y_bar = float(y.mean())
ssq = float(np.sum((y - y_bar) ** 2))

kappa_n = kappa0 + n
mu_n = (kappa0 * mu0 + n * y_bar) / kappa_n
alpha_n = alpha0 + n / 2.0
beta_n = beta0 + 0.5 * ssq + 0.5 * (kappa0 * n / kappa_n) * (y_bar - mu0) ** 2

# Posterior predictive is Student-t with:
# df = 2*alpha_n
# loc = mu_n
# scale^2 = beta_n * (kappa_n + 1) / (alpha_n * kappa_n)

df_pred = 2.0 * alpha_n
loc_pred = mu_n
scale_pred = float(np.sqrt(beta_n * (kappa_n + 1.0) / (alpha_n * kappa_n)))

x = np.linspace(loc_pred - 6 * scale_pred, loc_pred + 6 * scale_pred, 2500)

pred_pdf = t_dist.pdf(x, df_pred, loc=loc_pred, scale=scale_pred)
plug_in_pdf = norm.pdf(x, loc=y_bar, scale=float(np.std(y, ddof=1)))

fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=pred_pdf, mode="lines", name=f"posterior predictive t (df={df_pred:.1f})"))
fig.add_trace(go.Scatter(x=x, y=plug_in_pdf, mode="lines", name="plug-in Normal", line=dict(dash="dash")))
fig.add_trace(go.Histogram(x=y, nbinsx=20, histnorm="probability density", name="data", opacity=0.4))

fig.update_layout(
    title="Posterior predictive: Student-t vs plug-in Normal",
    xaxis_title="y",
    yaxis_title="density",
    width=900,
    height=450,
)
fig

11) Pitfalls#

  • Invalid parameters: df must be (>0), scale must be (>0).

  • Nonexistent moments: the mean/variance/kurtosis require (\nu>1), (\nu>2), (\nu>4) respectively. With small (\nu), sample means and variances can be extremely unstable.

  • Numerical issues:

    • for extreme (|x|) or small (\nu), computing the PDF directly can underflow; prefer logpdf.

    • CDF tails are best computed by special functions (SciPy does this well).

  • Parameterization confusion: many texts define the t distribution only via (\nu), but SciPy uses a location–scale family (df, loc, scale).
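The moment instability is easy to demonstrate: for moderate (\nu) the sample variance matches theory, while near the existence thresholds the estimates never settle. A sketch (sample size and seed arbitrary):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)
n = 200_000

# Moderate df: sample variance tracks the theoretical value nu/(nu-2) = 30/28.
var_30 = float(np.var(t_dist.rvs(30, size=n, random_state=rng)))

# df just above 2: the variance exists (= 2.1/0.1 = 21) but estimates are wildly unstable.
var_2_1 = float(np.var(t_dist.rvs(2.1, size=n, random_state=rng)))

# df = 1 (Cauchy): the running mean never converges, no matter how large n gets.
running_mean = np.cumsum(t_dist.rvs(1, size=n, random_state=rng)) / np.arange(1, n + 1)
```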

12) Summary#

  • The Student’s t distribution (t_{\nu}) is a continuous, symmetric, heavy-tailed distribution on (\mathbb{R}).

  • It arises as (Z/\sqrt{V/\nu}) (normal divided by chi-square scale), explaining its role in t tests.

  • (\nu=1) gives the standard Cauchy; (\nu\to\infty) approaches the standard normal.

  • Moments exist only above thresholds: mean ((\nu>1)), variance ((\nu>2)), kurtosis ((\nu>4)).

  • A NumPy-only sampler is easy via the ratio representation; SciPy provides a full-featured implementation and MLE fitting.